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io Abstract 

17 In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them 

is into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs 

19 are used to quantify the proportional contributions of various sources to a mixture. The most widely 

20 used application is quantifying the diet of organisms based on the food sources they have been observed 

21 to consume. At the centre of the multivariate statistical model we propose is a compositional mixture of 

22 the food sources corrected for various metabolic factors. The compositional component of our model is 

23 based on the isometric log ratio (ilr) transform of |Egozcue et al.] ( |2003[ ). Through this transform we can 

24 apply a range of time series and non-parametric smoothing relationships. We illustrate our models with 
t-H 25 3 case studies based on real animal dietary behaviour. 

> 

^ 26 1 Introduction 

H 

27 Stable isotope analysis is an increasingly important tool in the study of ecological food webs. The technique 

28 utilises the fact that biologically active elements exist in more than one isotopic form. Generally the lighter 

29 isotopic form is much more abundant in the environment than the heavier form, although their relative 

30 abundance is altered by a range of biological, geochemical and anthropogenic processes. These processes 

31 produce isotopic gradients which are reflected in the tissues of plants and animals. Differences in relative 

32 abundance of these isotopes within a particular sample can be measured using a mass spectrometer and 

33 expressed as the ratio of heavy to light form, which can then be standardised against international reference 

34 samples and reported in the delta (<5) notation as parts per thousand or per mil (%o). 

35 
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As a consumer's tissues are ultimately derived from the dietary sources they consume, it is possible to use 
stable isotope mixing models (SIMMs) to derive the assimilated diet of an individual, or a group of indi- 



viduals, given the isotopic ratios of the consumers' tissues and food sources (Phillips 20121. A number 



of recent papers have proposed models to analyse such data, gathering over 1500 citations since their first 
introduction. More recently, the models proposed for such data are Bayesian. In this paper we review the 
different models proposed and bring them into an over-arching framework. We include three case studies 



ranging from the simple to the complex, together with JAGS (Just Another Gibbs Sampler; Plummer 2003 1 
code for their implementatioij^] 

Generally the isotopic ratios of a sample of a consumer's tissues (e.g. blood, feathers, whiskers) are mea- 
sured along with a representative sample of potential items from a consumer's diet. The consumer isotopic 
values are represented in the model as the convex combination of the source values where the coefficients 
in the simplex are the 'dietary proportions'; strictly speaking they are the proportion of the consumers' 
dietary proteins obtained from the sources. Estimation of these dietary proportions is the main focus of 
our analysis. Most commonly the isotopic observations on the consumers and sources are multivariate with 



dimension 2. A thorough description of the uses of stable isotopes can be found in Inger and Bearhop ( 2008 1 



Once the isotopic data have been collected for both consumer and sources, it is usual to create an iso-space 
plot which shows the consumer and source values. An example is shown in Figure [l] It is desirable for the 
consumer values to lie within the fuzzy convex hull of the sources. However, a further phenomenon is often 
observed here, that of trophic enrichment, whereby light isotopes are lost during the conversion of source 
proteins into consumer tissues. The isotopic values of the consumer (or equivalently the sources) are thus 
adjusted by a trophic enrichment factor (TEF) which may vary by food source and consumer. These TEF 
corrections arise from laboratory studies, and thus contribute another set of (uncertain) data to our analysis. 



The inference challenge involved in a SIMM is thus to estimate the dietary proportions whilst taking account 
of the uncertainty in the source and TEF values. Clearly not all of the consumers will eat exactly the same 
diet, so it is common to use a hierarchical model. Furthermore, covariates such as time or age may be 
available which are thought to influence the dietary proportions. The model we present in this paper takes 
account of these common features in a multivariate hierarchical Bayesian model. 

The paper is organised as follows. In Section|2]we introduce our general SIMM and show it may be extended 
to form part of a larger class of hierarchical compositional generalised linear models. In Section [3] we outline 
previous work in this area and show how each of these fits into our general SIMM. Section [4] outlines the 
statistical issues concerning the formulation of such models and how they can be fitted. We outline three 
case studies in Section [5j showing how the model works in different situations depending on the information 
available. We discuss future directions in Section [6] 



2 Model formulation 

We first provide the notation we use to formulate our model. We suppose that there are N consumers on J 
isotopes, with K sources. We outline the most important components of our model below: 

• Yij represents the isotope measurement on consumer i for isotope j. We write Yi as the J- vector of 
isotope values for consumer i 

1 See mathsci.ucd.ie/~parnell_a/ 
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Figure 1: Isospace plot of the geese data of case study 1 (see Section 5.1). The consumer information is 
shown in the black filled circles, whilst the sources are shown as contours (90% range) and filled ellipses 
(50% range). The consumers seem to lie close to the Zostera source so this is likely to form a substantial 
part of their diet. 



• Sijk is the source value for consumer i on isotope j and source k. We write Sik to be the J-vector of 
isotope source values for consumer i on source k, and Si to be the J x K matrix of source values for 
consumer i. 

• Cijk is the TEF value for consumer i on isotope j and source k. We write Cjfc to be the J-vector of TEF 
values for consumer i on source fc, and as the J x K matrix of TEF values related to consumer i. 

• Pik is the dietary contribution of source k for consumer i. p i is the if- vector of dietary proportions for 
consumer i. Estimation of these dietary proportions is the main focus of our analysis. 

• eijk is a random noise term representing residual variation. We write ej as the J-vector of residual 
terms for consumer i, and set ~ N(0, S) with S a covariance matrix. 



so Using the notation above, we write our general model as: 

Yi-Nipfisi + a),!;). 



(i) 



We assume a hierarchical formulation so that Sik ~ N(^L s k , S|) and Cj£ ~ N{n^ where the means and 
covariances of the sources and TEFs are estimated from the source and TEF experimental data. 



Of particular interest is the modelling structure for the dietary proportions p. We use an isometric log-ratio 



(ilr) approach as proposed by lEgozcue et al. ( 2003 ) , though other transformations are available (see next 



two sections for further discussion). The transformation is written as: 



</>. t =ilr( Pi ) = V T log 



Pn 
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with g (p, ) = 
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with V a K — 1 x K matrix of orthonormal basis functions on the simplex. The inverse transformation 
p i — ilr~ 1 (</> i ) simply involves exponentiating and re-normalising the values. There are two consequences 
of working with the ilr. The first is that we now work in a K — 1 dimensional space. The second is that 
there is no obvious link between the elements of fa/, and Pik , so we lose some degree of interpretability. We 
further parameterise the transformed proportions so that (pik ~ Nfyk, Kk) with Kk quantifying a random 
effect variance. Were we to use the centred log-ratio (clr) transform (equivalent to setting V = J; though 
see Section [I] as to why we might not do this) then Kk would represent the consumer-level variance in diet 
for source fc. Finally we set 7^ to be a mean term restricted in some fashion to allow for estimation. A 
multivariate prior for <f) i would also be feasible. 

In situations where covariates Xi are available, they are usually linked to the model through the dietary 
proportions. The covariates may take the form of age, sex, time or any other variables upon which diet 
is expected to depend. In certain cases (such as our third case study) both the diet and the sources are 
expected to be functions of a covariate. In the simpler case we apply the covariates by making 7^. functions 
of Xi. In the more advanced case we additionally apply them to /jti and . 

We term the TEF data set T> c and the source data set T> s , each consisting of collections of J- vectors of 
isotope ratios for each of the K sources. A full Bayesian posterior distribution gives: 



t(p, <t>, 1, », S, E s , S c , n s , fi c , s, c\Y, x, V c , V s ) cx 
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(3) 



A Directed Acyclic Graph (DAG) is shown in Figure [2j The sub-models for the sources and TEFs, namely 
7r(/Li|, E||Z> S ) and 7t(/lx£, S||I> c ) can be updated as part of the modelling steps, but we prefer a more efficient 



empirical Bayes approach (Carlin and Louis 2000 1 whereby we approximate fif,, \x c kl Ti s k , by their sample 
estimates, thus cutting feedback and removing these from the updates and the posterior. Note that this has 
minimal effect on the source and TEF random effects Sik and which are still updated as part of the mod- 
elling process. The prior distributions we give to S and k& are vague Inverse- Wishart and Inverse-Gamma 
respectively. 



3 Previous work 



The earliest attempts at applying a mixing model framework to stable isotope data did not use probability 
as the basis for estimation. Initially, SIMMs were restricted to systems involving a single consumer (or 
the mean of multiple consumers), and where the number of isotopes and sources was arranged such that 
J + 1 = K. Such an arrangement yields a linear system with a single solution. Phillips and Gregg (20011 
provided propagation of error calculations for such a system in their IsoError model to establish confidence 
intervals around the estimates based on the variances of the consumer and source isotopic measurements. The 
method was expanded upon in IsoSource ( |Phillips and Gregg 2003 1 to relax the J + 1 = K restriction and 
allow for multiple sources but without explicit incorporation of source and consumer variability. Isosource 
works by simulating values of the dietary proportions p on a grid to produce multiple valid solutions to the 
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Figure 2: A Directed Acyclic Graph (DAG) of our model. Circles indicate parameters to be estimated whilst 
squares indicate data. The arrows indicate the direction of information flow. 



132 
133 
134 
135 
136 
137 
138 
139 
140 
141 
142 
143 
144 
145 
146 
147 
148 
149 
150 
151 
152 
153 
154 
155 
156 



linear system. The valid solutions could be plotted in a histogram-like fashion, though they did not represent 
probability distributions, rather simply the range of values which might be plausible given the geometry of 
the system. 



These initial attempts were formalised in a Bayesian fashion in the models MixSIR (Moore and Semmcns 



2008) and SIAR ( [Parnell et al. 2008). The MixSIR model can be thought of as a simplification of Equation 
I] without explicit random effects across the dietary proportions, and with X set to zero. The sources and 
TEFs are treated as independent across isotopes and are given fixed values for their mean and variance. 
The dietary proportions are given independent Beta-distributed priors or, in a later version, a Dirichlet 
distribution. Since the dietary proportions are the only parameters in the model, it can be fitted extremely 



efficiently using Importance Resampling (e.g. Robert and Casella 2005 1 on a grid encompassing the range 
of proportion values. Updated versions of the MixSIR model have included random effects in the dietary 
proportions through the clr transform, and also have allowed for hierarchical models to be fitted, most el- 



egantly in capturing familial relationships affecting the diet of gray wolves in British Columbia (Semmens 



et al. 2009 1. These latter advanced versions of MixSIR are fitted using the JAGS software (Plummer 2003) 



The SIAR model (Parnell et al. 2010) is in many ways similar to the basic MixSIR model (and thus still a 
simplification of Equation [Ij though includes a residual component which is treated as independent between 
isotopes and given an Inverse Gamma prior. The model also allows for concentration dependencies {qkj-> 



vectorised as q) which quantify the proportion of the isotope in the given food source (Phillips and Koch 
2002). They can be added in to our model by replacing p i in Equation [I] with C(p i q) where C is the 
simplex closure operator and is the simplex perturbation (Egozcue et al.||2003[). Usually q are either fixed 



or given a suitably informative prior distribution. The SIAR model is fitted using standard MCMC with 
Metropolis-Hastings steps to update the dietary proportions. 
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More recently, the IsotopeR model has been introduced which extends the SIAR/MixSIR models to a mul- 
tivariate setting (both sources and TEFs are multivariate normal) and partitions the residual covariance S 
into a mass spectrometer calibration error and that of residual error. They further allow for the sources, 
TEFs and dietary proportions to be random effects with the latter obtained through the clr transform (see 
next section for further discussion). The model is fitted in JAGS (Plummer 2003) but does not allow for 



covariate information or for the estimation of the dietary random effect variance. 

Aside from explicit SIMM model development, recent focus has also been on the performance of SIMMs in 
jion-ideal cond itions, most notably with respect to the characterisation of source and TEF values by [Bond 
|and Diamond (2011 1. Clearly it is absolutely vital that food sources are not excluded from the model as they 



will yield biased dietary proportions. Similarly, the estimation of the TEF values must be conducted appro- 
jpriately or there will be some extra uncertainty in the estimated dietary proportions. In related work, [Ward] 
|et al. (2011) considered the problem of (dis) aggregating sources and its effect on the resulting estimates. 
For example, if a consumer only eats part of a food source it may be hard to obtain isotopic values from 
just that part. Similarly if two sources, though different species, lie in the same location in iso-space it may 
be impossible for the model to determine the difference in their dietary consumptions. Thus on occasion it 
may be pertinent to aggregate sources without any loss of information. Alternatively the aggregation can be 
accomplished with fewer assumptions if it is calculated a posteriori by combining the negatively correlated 
dietary proportions. 

Lastly, it should be noted that there are strong connections between SIMMs and the 'end member' analysis 
used by geologists to determine the composition of e.g. river sediment. In such cases the sources are usually 
known with minimal error and the challenge is to estimate the proportional contributions of different sedi- 
ment sources. A number of different methods have been proposed, e.g. Soulsby et al. (20031; Brewer et al. 



(2011) (and references therein). Ours most closely resembles that of Palmer and Douglas (20081, though 
they use the air transformed proportions (see below for definition) which are given a spatial prior distribution. 



4 Statistical issues in SIMMs 
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The models we fit use ideas from Bayesian hierarchical modelling (e.g. Gelman et al. 20031 and composi 



tional data anaylsis (Aitchison 



1986). BHM is now part of the standard toolbox of Bayesian statistics and 
Of more interest however is the compositional structure applied to the 

A recent review 

of the state of compositional data analysis can be found in |Pawlowsky-Glahn and Buccianti (20111. Our 



we do not discuss them further here 
dietary proportions as this can strongly affect the behaviour of the posterior distribution 



models differ fundamentally from many standard problems as our compositions are not observed directly 
but are latent parameters to be estimated, constrained by their geometrical position in iso-space and the 
covariates upon which they may depend. 

The starting point for the early Bayesian SIMMs models was that of the Dirichlet distribution, being per- 
haps the simplest valid distribution on the simplex. The usual prior distributions used were either flat 
where all Dirichlet parameters arc set to 1 or the Jeffreys prior where all are set to 1/K. Unfortunately, as 



is well known (e.g. Aitchison, 19861, the Dirichlet distribution suffers from a very rigid sub-compositional 



independence assumption. This is not necessarily a problem when used as a prior distribution with fixed 
components as the posterior distribution may well show interesting sub-compositional properties. However, 
if dependence is to be modelled through hyper-parameters of the Dirichlet distribution (e.g. with covariates) 
this restriction will remain in the posterior. 



A number of extensions to the Dirichlet have been proposed (e.g. Wong 1998), but we focus here on the 



logistic-normal transformations of Aitchison (19861 and Egozcue et al. (20031, through which more flexible 



sub-compositional dependence can be obtained. The simplest of these is perhaps the additive log-ratio (air) , 
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where g(pj) in Equation [2] is set to be one of the chosen proportions (which is then removed from the com- 
position) and V — I. However, this can perform poorly when there is no obvious choice of denominator 
and is not permutation-invariant. The centred log-ratio (clr; defined in Equation [2] when V=I) removes 
the need for a choice of denominator in the log ratio but produces a covariate matrix of rank K — 1. It is 



also not sub-compositionally coherent (see Pawlowsky-Glahn and Buccianti 201 1| for further discussion of 
these terms). Finally the isometric log-ratio (ilr) uses orthonormal basis functions in the simplex to obtain 
coordinates that are isometric and satisfy the usual compositional requirements of coherence and permuta- 
tion/subcomposition invariance. The choice of basis functions is somewhat subjective; we follow the method 
in Egozcue et al. ( 2003 ) . From a Bayesian perspective, the latent compositional parameters are more easily 



5 identifiable when working with the ilr. 

6 

7 Once a suitable transform has been chosen, it is feasible to include covariates or perform any of the traditional 

s multivariate analysis techniques. Numerous examples can be found, including a geo-statistical framework 



J Tolosana-Delgado et al. 2011 1, discrete time series ( Barcelo-Vidal et al. 2011 1, or spectral methods (Pardo- 
Jguzquiza and Heredia 2011 1. A popular topic is that of zero compositions or zero-inflation (e.g. Butler and 



Glasbey 2008) which is not a severe issue in SIMMs because we know from experimental observation that 



all dietary sources are consumed. 



5 Case studies 
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We now present three case studies and show how our general model can be used in each of the scenarios. 
Jn the first case study we analyse the diet of a small sample of geese from data previously studied by |Inger 
STaL (2006). This first case study uses the model as proposed in Section |2j and can be seen as a small 
extension to that of Hopkins and Ferguson (20121. The second case study extends the geese model to allow 



for the inclusion of covariates or basis function models. Our final case study includes a compositional time 
series component where the sources and consumers are observed at different time points. In this final case 
study the data are swallows consuming chironomid midges, other freshwater invertebrates and terrestrial 
invertebrates. In all cases we run the models using the JAGS software and check convergence using the 



coda package ( 


Plummer et al. 


Brooks and German 




1998). 



1992 



37 
38 
39 
40 



42 
43 
44 
45 
46 
47 
48 
49 
50 
51 



5.1 Case study 1 

Our first case study contains 9 (8 13 C, S 15 N) pairs of isotopic values taken from the blood plasma of Brent 
geese sampled on 26th October 2003. The food sources are Zostera spp, terrestrial grasses, Ulva lactuca and 
Enteromorpha spp. Our empirical Bayes esti mates for the source s are given in Table[l] The TEFs were taken 



from values in the literature (see references in Inger et al. 
1 



and ££ 







2006 



for more details) so that fi'f. = (1.63, 3. 54) 1 
for all k. For a simple random effects formulation we set 7^ = over all i and k. An iso- 



space plot for the data (where the sources have been corrected by the TEFs) is shown in Figure [T] The JAGS 
model was run for 3 chains over 50,000 iterations, removing 10,000 for burn-in and thinning by a factor of 20. 

A density plot of the mean dietary proportions is shown in the left panel of Fi gure |3) They can be seen to 
compare favourably with the simpler SIAR model (see the function siardemo in Parnell et al. 2008 1, though 



in this case we have extra information covering individual dietary estimates, as well as improved estimation 
of the source and TEF random effects. In particular, the flexibility of the hierarchical formulation through 
the ilr transform allows for some multi-modality in the posterior distributions. The right panel of Figure [3] 
shows a matrix plot of the joint behaviour of the dietary proportions. These can be useful in determining 
unavoidable model inadequacy, for example when it is impossible to ascertain which food sources are being 
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Source 


Enteromorpha 




Terr Grasses 




Ulva lactuca 


Zostera 


Ms ( 


-14.06, 9.82) J 
" 1.37 
1.37 


( 


-30.88, 4.43)' J 
" 0.41 
5.15 


' ( 


-11.17, 11.2)' J 
" 3.83 0.85 " 
0.85 1.24 




(-11.17, 6.45) r 
1.48 -0.56 " 
-0.56 2.16 





Table 1 : Estimates of the source means and covariance matrices for the geese data of case study 1 . 



252 consumed together. A strong negative correlation indicates that the food sources are indistinguishable. For 

253 example the strong negative correlation between Zostera and Ulva lactuca indicates that, whilst it is clear 

254 they are consuming mainly Zostera, the balance between the two cannot be exactly determined. 

255 




Proportion 



Figure 3: Left panel: Density plot of mean dietary proportions for the geese data of case study 1. Right 
panel: matrix plot of the posterior dietary proportions obtained from the geese data. The upper-diagonal 
shows a contour plot, the diagonal a histogram, and the below-diagonal the correlation between the different 
sources. 

256 

257 There are many other useful statistics we can calculate here quite simply from the posterior distributions. 

258 In particular, we can focus on individual level variation by calculating, for example, the probability that an 

259 individual consumes more Zostera than another. Similarly with access to the variance parameters we can 

260 determine whether there is more variation amongst consumers in some sources rather than others. Lastly, it 

261 is often desirable to perform model comparison diagnostics to determine whether certain parts of the model 

262 can be removed without a detrimental effect on prediction. However, we do not perform any further analysis 

263 on this data set, preferring instead to use these tools when analysing the more sophisticated data below. 

264 

265 5.2 Case study 2 

266 We now extend our Goose model to a larger data set of 248 observations collected over the period of October 

267 2003 to April 2005 of which the previous case study was just a small part. The sources and TEFs are believed 
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268 to be stable over the course of the study so the source means and covariances are the same as Table [T] The 

269 diet of the geese will however vary during the season due to variations in abundance of food sources along 

270 with social and demographic factors. An iso-space plot for the full data is shown in Figure [3] The iso-space 

271 plot appears to show that the diet during October to be focussed mainly on Zostera, moving on to Viva 

272 lactuca and/or Enteromorpha during November /December. In January and February the diet appears to 

273 be relatively mixed, but focussing almost solely on Terrestrial grasses around April. In addition to the sta- 

274 ble isotope information we have covariates which state the Goose's sex and whether they are juvenile or adult. 




Figure 4: Isospace plot of the full goose data set where the consumers are labelled by month. The data in 
case study 1 correspond to the data shown for month 10. Note that the sources and the TEFs are unchanged 
from Figure [T] 

276 We consider 6 possible models for the dietary behaviour, accounting for the covariates. In each case we set 

277 lik = f3 k where X$ is an L-vector of covariates or basis functions for consumer i, and j3 k is an L-vector 

278 of regression parameters associated with (f> k . Note that, when using the ilr transform, the parameters (3 k 

279 do not have any association with source k. With the extra parameters in the model, convergence of the 

280 MCMC algorithm is a problem because the parameters are often highly correlated across sources, though 

281 this is somewhat lessened with appropriate choice of V in the ilr. We find it helpful to re-parameterise 

282 with Helmert contrasts across sources so that 7 a = Xj (3 ± and ~f ik = Xj (f3 1 + (3 k ) for k > 2. We use the 

283 first source (Enteromorpha) as k — 1 as this seems to be consumed throughout the season though this is 

284 obviously not known in advance so does involve some degree of trial and error. 

285 

286 In all cases the parameters /? are given vaguely informative N(0, 10) distributions as a value of |0| in excess 

287 of 10 is likely to yield dietary proportions near 100% (though again this depends on the correlation structure 



289 



288 the data set). We compare the different models using the Deviance Information Criterion (DIC; Spiegel- 
palter et al.||2002| and, for the final chosen model, posterior predictive distributions of the data. We use two 



290 versions of the DIC, the standard pr> method of Spiegelhalter et al.M2002b, and the py method of Plummcr 



29i (2008) which estimates the optimism as half the variance of the deviance, and thus penalises complex models 



292 more harshly. Due to the extra complexity in the models, we run them with 3 chains for 200,000 iterations, 
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293 removing 20,000 for burn-in and thinning by 90. 

294 

295 The first model we try involves no covariates and is thus the same as that in Section |5.1| The second model 

296 includes an intercept term and time as a simple linear covariate. The third model replaces linear time with 

297 a single harmonic component so that Xj = [l, cos (^gif ) , sin (t^)] where i, is the Julian day. The fourth, 

298 fifth and sixth models are expansions of model 3 to include juvenile/ adulthood (model 4), sex (model 5) and 

299 also their interaction (model 6) as covariates. Table [2] shows the different models and the associated DIC 

300 values. Both versions of DIC seem to prefer models with the harmonic covariate, and also the addition of 

301 cither sex or adulthood. 

302 

303 Figure [5] shows the relationship between Julian day and dietary proportion for the different sources for model 

304 4. The switch from Zostera, to Enteromorpha, to terrestrial grasses is very clearly seen in both juveniles 

305 and adults, though the uncertainty in juveniles is slightly higher, especially with respect to Enteromorpha. 

306 Figure [6] shows the posterior predictive distribution of the data under model 4. The model seems to pre- 

307 diet the data well, the three modes corresponding to the main sampling times of October, February and April. 



Model Covariate(s) DIC (using py) DIC (using pjj) 

1 Nohe 26907 1210.0 

2 Julian day (linear) 16957 524.3 

3 Julian day (harmonic) 16683 385.1 

4 Julian day (harmonic), Juvenile/ Adult 7551 393.6 

5 Julian day (harmonic), Sex 8600 382.8 

6 Julian day (harmonic), Juvenile/ Adult, Sex, Interaction 10812 382.9 



Table 2: Table of models and model selection criteria. The models with the lowest DIC are shown in bold. 



Juveniles 

Enteromorpha a Terrestrial Grasses Ulva Lactuca Zostera 




Adults 

Enteromorpha • Terrestrial Grasses Ulva Lactuca 



300 320 340 




Figure 5: Plot of proportion values against Julian day for model 3. The left panel shows estimates for 
juveniles whilst the right panel shows adults. The solid lines show median estimates, the outer of the 
polygons show 90% credibility intervals. The Geese appear to focus on Zostera around November before 
moving on to Enteromorpha and then terrestrial grasses. There is slightly more uncertainty in the juveniles 
than the adults. 
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Figure 6: Plot of the predictive distribution of the data from chosen model. The observations are shown as 
filled circles, whist the posterior predictive density is represented by the contours. 



5.3 Case study 3 

Our last case study concerns the dietary behaviour of barn swallows (unpublished data) . The data are stable 
isotope ratios of blood plasma samples from birds captured between May and August in 2009. Again, we 
use Julian day to determine their behaviour over time. However, in this scenario the sources (chironomid 
midges, other freshwater invertebrates, and terrestrial invertebrates) are also expected to change over time, 
and may have been observed on different days to that of the swallows. We thus expand our model to include 
a temporal component on the source vector as well as that of the dietary proportions. In each case, we use 



P-splines ( Eilers and Marx[ 1996 1 to explain the suitably flexible behaviour. Other time series methods, 
such as random walks or Levy processes, may also be appropriate. We write continuous time as fj so that 
the consumers are now Y{tj). We set 7fc(ij) = Xj (3 k where now Xj is a an L- vector of cubic B-spline basis 
functions evaluated at time point t\ and /3k are weights for each basis function on source k. The P-splinc 

20 formulation is completed by giving a random walk prior such that (3ik — Pi—i,k ~ ■^(0j r fc~ ) where is a 

21 roughness parameter associated with source k and given a weakly informative Ga(2, 1) prior. 

22 

23 The sources are now described by a multivariate spline model so that source data pairs, denoted s'j k (t) for the 

/r iT 

24 source experimental data at time t on isotope j and source k, are given TV I X (3 l7 . . . , X (3j , X (t) 

25 independently for each source k. The number of observations for each source is likely to be different, and 

26 certainly not equal to the number of consumer observations N. Here, the spline parameters 0^ determine 

27 the mean behaviour of the sources over time on isotope j. The J x J variance matrix £'(£) is also allowed to 

28 change over time with diagonal elements given log-splines: log(S^.(i)) ~ N(X T (3- S , k-s). The cross isotope 

29 covariance E' 12 is parameterised through a single correlation parameter for each source, denoted pk, and does 

30 not change over time. A spline could also be used here (for example on the arctangent of pk) but this was 

31 not found to improve the fit . The source spline model was run for each of the three sources in turn and used 

32 to calculate Maximum a Posteriori (MAP) estimates of and f° r a ^ times t at which consumer 
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333 data were available. An iso-space plot of the swallows data is shown in Figure [7] 

334 




Figure 7: Isospace plot of the swallows data. The source spline model has been run to obtain estimates of 
the source means and covariances throughout the study period. Arrows indicate the direction of movement 
over time for the sources and the swallows. The Julian day and the 50% standard ellipses are given for 
Julian days 150, 190 and 220. Note that the chironomids' 5 13 C values increase over time from the start of 
the study period. A similar occurrence can be seen in the terrestrial invertebrates. The data are shown as 
kernel-smoothed estimates of the swallows' isotope data again over Julian day, starting at 150 and ending 
on day 220. The swallows can be seen to also increase their <5 13 C values up until around day 180 whereupon 
the (5 13 C returns towards its original value. This plot should be read in conjunction with Figure [8] 

335 We use the source model predictions in our standard SIMM with the spline formulation on the proportions as 

336 outlined above. An alternative model where all parameters are estimated simultaneously would be possible, 

337 if rather slow. Instead, we retain the cutting feedback assumption so that the source and dietary proportions 

338 are run separately. For both the source and the dietary proportion models we run for 200,000 iterations, 

339 removing 20,000 for burn-in and thinning by 90. For both models, we use 25 knots, which seems to cover 

340 the flexibility in the data adequately. 

341 

342 Figure [8] shows the posterior dietary proportion estimates over Julian day for the swallows, predicted from 

343 the resulting spline parameter estimates. The results clearly indicate that they are feeding on mainly fresh 

344 water invertebrates during the early part of the data before concentrating on chironomids around the start of 

345 August. Figure [9] shows the predictive distribution of the data under this model. The fit appears satisfactory. 

346 



6 Discussion and future directions 

The SIMM formulation outlined in Section [2] allows for a rich framework upon which to include a variety 
of other statistical structures. The basis of such models is a mixture compositional structure applied to the 
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Figure 8: Plot of dietary proportion values against Julian day for the swallows data of Case Study 3. The 
solid lines show median estimates, whilst the filled polygons show 90% credibility intervals for each source. 
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Figure 9: Predictive distribution of the data for the swallows example. The observations are shown as filled 
circles, whist the posterior predictive density is represented by the contours. 



350 dietary sources. In the case studies above we have illustrated how some simple regression and smoothing 
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351 models might be included. The results seem useful and allow for many interesting findings which would not 

352 have otherwise been possible without, e.g. the time series or spline components. 



353 



354 The main challenges in such models are that the source and TEF values are fully and correctly characterised. 

355 It is a simple geometrical exercise to verify that if sources are missing from the data set, or that the TEF 

356 means and errors are poorly estimated, the dietary proportion estimates will be biased. This bias is com- 

357 pounded by the compositional nature of the problem where, if one dietary proportion is poorly estimated, 

358 then the others may well be too. There is no statistical test for missing sources; we rely on the ability of the 

359 ecologist to observe the system adequately. This may be particularly important if multiple organisms and 

360 sources are to be analysed simultaneously in a dietary network. A single missing source may cause biases 

361 across the dietary proportion estimates. Such models are not to be encouraged unless there is very strong 

362 evidence that no sources are missing and that TEFs are estimated correctly. 



363 



364 It is reasonably straightforward to expand our approach for other values of J where differing numbers of 

365 isotopes may be available for analysis. It should be noted, however, that if J > 3 it becomes harder to 
see determine model fit, especially as no obvious iso-space plot can be created. The solutions to this problem 

367 may lie in closer scrutiny of predictive distributions, and some subjective judgement over the size of the 

368 residual covariance matrix S. In such scenarios extra care is required in reporting the output of the SIMM. 



369 



370 There are further opportunities for expansion of the models. Some of these may include: 

371 • The simultaneous inclusion of multiple tissue varieties. It is often the case that different tissues are 

372 sampled on the same consumer as they are captured. These differentially replenished tissues will 

373 represent the dietary proportions consumed over different periods. For example, whilst blood plasma 

374 might represent the immediately sampled food sources, feathers might represent the diet consumed over 

375 the previous few months. A long-term data set where multiple tissues are analysed simultaneously may 

376 allow for increased precision in the dietary proportions. Alternatively, it may be possible to estimate 

377 the time scale over which the tissues are responding. 

378 • Clustering/Mixture models to determine groupings. If there are hidden groupings amongst the organ- 



380 



379 _ isms it may be possible to discern them using a model-based clustering approach (sensu Fraley and 



Raftery 20021. Even without such groupings, increased flexibility can be obtained by using mixtures 



381 of Gaussian distributions to model non-parametric behaviour. 

382 • Long-tailed multivariate distributions to account for outliers or small sample sizes. Clearly where 

383 sample sizes are small the multivariate Gaussian assumption (especially for sources) may be invalid, 

384 and thus heavier-tailed distributions may be required. One such which seems to be most easily fitted 



385 is the Multivariate Normal-Inverse Gaussian (MNIG Barndorff-Nielsen 1997) which has been used 

386 previously in clustering and financial settings. 

387 These are just three of the active areas of research to which SIMMs are being applied by our group. Many 

388 others arc likely to appear as the field advances. We hope to report soon on these exciting new developments. 

389 
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